1.  Introduction

Empirical  multiple  time  series  analysis  is  concerned  with  finding 
relations among r time series $X_1(\cdot), \ldots, X_r(\cdot)$, given finite samples

(1)  $\{X_1(t),\ t = 1, 2, \ldots, T\},\ \ldots,\ \{X_r(t),\ t = 1, 2, \ldots, T\}.$

Multiple time series modelling could be equivalently defined as
multivariate analysis of a sample of dependent (rather than independent)
random vectors

(2)  $X(t) = \begin{bmatrix} X_1(t) \\ \vdots \\ X_r(t) \end{bmatrix}.$

We call $X(\cdot) = \{X(t),\ t = 0, \pm 1, \pm 2, \ldots\}$ a multiple time series.

The  point  of  view  that  a  multiple  time  series  is  a  series  of 
vectors  (rather  than  a  vector  of  series)  seems  useful  for  mathematical 
statistical  investigations  of  the  distribution  of  various  sample 
statistics.  Point  One  of  this  paper  is:  for  pre-mathematical  statistical 
investigations  of  the  specification  of  the  models  to  be  fitted  it  may  be 
essential  to  first  model  each  component  by  itself. 

This  paper  seeks  to  provide  a  general  framework  for  the  theory 
and practice of multivariate analysis of time series. It seeks to
compare:

(1)  Spectral  approaches  to  finding  relations  among  time  series. 

(2)  Time  domain  or  innovations  approaches  to  finding  relations 
among  time  series . 

The paper also seeks to focus attention on:

(3)  Innovations  approaches  to  cross-spectral  estimation. 

(4)  The  problem  of  multivariate  analysis  of  the  joint  innovations 
covariance  matrix  and  the  sampling  properties  of  its  estimators. 

The various sections are entitled: 2. Innovation Approaches to
Modelling, 3. Spectral Approaches to Modelling, 4. Relations
Between Time Series, 5. Autoregressive Approach to a Single Series,
6. Multiple Spectral Density Estimation.
2.  Innovation Approaches to Modelling

When  we  admit  the  possibility  that  our  vector  samples 
$X(1), \ldots, X(T)$ are not independent, and seek to build stochastic
dynamic models, the statistical inference problem could be conceived as
one of estimating

(1)  $m_j(t) = E[X_j(t)],$

(2)  $K_{jk}(s,t) = \mathrm{Cov}[X_j(s), X_k(t)];$

these means and covariances specify the probability law of the
observations when it is assumed to be multivariate normal. In order that
there not be as many or more parameters as observations, one must assume
models which restrict $m_j(\cdot)$ and $K_{jk}(\cdot,\cdot)$, since
otherwise statistical inference is impossible.

A multiple time series $X(\cdot)$ is called covariance stationary
[see Parzen (1962), Chapter 3] if for each index j and k there is a
function $R_{jk}(v)$ of $v = 0, \pm 1, \ldots$ such that

(3)  $\mathrm{Cov}[X_j(s), X_k(t)] = R_{jk}(t-s).$

The r by r matrix

(4)  $R(v) = \begin{bmatrix} R_{11}(v) & \cdots & R_{1r}(v) \\ \vdots & & \vdots \\ R_{r1}(v) & \cdots & R_{rr}(v) \end{bmatrix}, \qquad R_{hj}(v) = \mathrm{Cov}[X_h(t), X_j(t+v)],$

is called the covariance matrix $R(\cdot)$ of the covariance stationary
multiple time series $\{X(t),\ t = 1, 2, \ldots, T\}$.

The  sample  statistics  appropriate  for  inferring  models  for 
covariance stationary time series are often interpretable even for
non-stationary time series [either as time varying statistics, as in
Priestley (1965), or through transformations to stationarity, as in
Parzen (1967a) or Whittle (1963b)]. Therefore we assume that the multiple
time series $X(\cdot)$ being discussed is covariance stationary.

Let us review briefly models for a univariate stationary time
series $X(\cdot)$; its covariance function $R(v)$ has spectral representation

(5)  $R(v) = \int_{-\pi}^{\pi} \cos v\omega \; dF(\omega).$

When seeking to model a time series $X(\cdot)$ with given covariance
function $R(\cdot)$ and spectral distribution function $F(\cdot)$, in principle one
may treat separately the three types of distribution functions into
which $F(\cdot)$ may be decomposed:

(6)  $F(\omega) = F_d(\omega) + F_{ac}(\omega) + F_{sc}(\omega);$

in words, $F(\cdot)$ is the sum of three distribution functions which are,
respectively, discrete (or purely discontinuous), absolutely continuous,
and singular continuous.

Observed time series are assumed to have a mixed spectrum, in the
sense that: (i) the singular continuous part of the spectral
distribution function vanishes, (ii) the discrete part has only a finite
number of jumps, and (iii) the absolutely continuous part has a spectral
density function $f(\omega)$ satisfying

(7)  $\int_{-\pi}^{\pi} \log f(\omega)\, d\omega > -\infty.$

Note that $f(\omega)$ is an even non-negative function such that

(8)  $F_{ac}(\omega) = \int_{-\pi}^{\omega} f(\omega')\, d\omega'.$

We call $m(t) = E[X(t)]$ the mean value function of $X(\cdot)$. It may
be shown that $X(t) - m(t)$ may be written as the sum of two time series,

(9)  $X(t) - m(t) = X_d(t) + X_c(t),$

satisfying

(10)  $X_d(t) = A_0 + \sum_j \{A_j \cos \lambda_j t + B_j \sin \lambda_j t\},$

where $A_0, A_j, B_j$ are uncorrelated random variables and $\lambda_j$ are
frequencies in the band $0 < \lambda_j \le \pi$, while

(11)  $X_c(t) = \eta(t) + b_1 \eta(t-1) + b_2 \eta(t-2) + \cdots,$

where $\{b_j\}$ are constants such that $\sum_j b_j^2 < \infty$ and $\{\eta(v)\}$ are
uncorrelated random variables.

The probability distribution of $A_0, A_j, B_j$ cannot be estimated
from a single realization of $X(\cdot)$, or even of $X_d(\cdot)$; all one can hope
to estimate is the value of these variables in the realization observed.
Thus $A_0, A_j, B_j$ can be treated as constants (rather than random
variables) for purposes of statistical inference and $X_d(\cdot)$ can be
treated as part of the mean value function of $X(\cdot)$. The mean value
function of $X(\cdot)$ has to be eliminated by some detrending procedure
(which could involve spectral analysis) in order to do statistical
inference on $X_c(\cdot)$, the "fluctuation" part of $X(\cdot)$.

Point  Two  of  this  paper  is:  in  multiple  time  series  modelling  we 
can assume that we are dealing with zero mean jointly covariance
stationary time series $X_j(\cdot)$, each satisfying (7) and therefore
satisfying a model of the form

(12)  $X_j(t) = \eta_j(t) + b_1^{(j)} \eta_j(t-1) + b_2^{(j)} \eta_j(t-2) + \cdots.$

To understand the meaning of the random variables $\eta_j(t)$, let us
hereafter consider normal time series $\{X(t)\}$. One can associate to a
univariate series $X(\cdot)$ a series of successive conditional expectations
(or minimum mean square error predictors)

(13)  $X^*(t) = E[X(t) \mid X(t-1), X(t-2), \ldots],$

and conditional variances (or mean square prediction errors)

(14)  $\sigma^2 = \mathrm{Var}[X(t) \mid X(t-1), X(t-2), \ldots] = E[|X(t) - X^*(t)|^2].$

For non-normal time series, the notion of projection is used in place of
conditional expectation; see Rozanov (1967).

The one step prediction error, denoted

(15)  $\eta(t) = X(t) - X^*(t)$  or  $\tilde{X}(t) = X(t) - X^*(t),$

is called the innovation at time t. The successive innovations $\eta(t)$
are a sequence of uncorrelated (independent when $X(\cdot)$ is normal, as
assumed here) random variables,

(16)  $E[\eta(s)\, \eta(t)] = 0$  if  $s \ne t.$

An uncorrelated sequence $\eta(\cdot)$ is called white noise; if all variances
are equal it is called stationary white noise.
Writing $X(t)$ as an infinite series in $\eta(t), \eta(t-1), \ldots$ is one
way of expressing the time series $X(\cdot)$ as the output of a filter whose
input is white noise $\eta(\cdot)$:

    $\eta(\cdot)$  -->  [ Filter $\Phi$ ]  -->  $X(\cdot) = \Phi[\eta(\cdot)]$

By representing a time series as the output of a filter whose input is
innovation white noise we are able to conveniently solve estimation
(prediction, signal extraction) problems and simulation problems for the
time series.

For a univariate time series $X(\cdot)$ which is assumed to be normal,
covariance stationary, zero mean, and non-deterministic [in the
sense that it satisfies model (11)], the modelling problem can be solved
by estimating either:

(i)  its covariance function $R(v)$, or

(ii)  its spectral density function $f(\omega)$, or

(iii)  its innovation variance $\sigma^2$ and the filter $\Phi[\cdot]$ which
transforms $\eta(\cdot)$ to $X(\cdot)$.

Point Three of this paper is: approach (iii) is the most satisfactory
for two reasons: (1) as the answer we seek, since it is the most
convenient form for prediction and control, (2) as the most suitable
means of obtaining (ii) [the details of how to do this are discussed in
Section 5].

Point Four of this paper is: for a multiple stationary normal time
series two types of innovation approaches to modelling can be considered:
(i) the individual innovation approach, and
(ii) the joint innovation approach.
The individual innovations, denoted $\eta_j(t)$, are defined in terms of
the predictors of each series given its own past

(17)  $X_j^*(t) = E[X_j(t) \mid X_j(s),\ s < t]$

by

(18)  $\eta_j(t) = X_j(t) - X_j^*(t).$

Equation (12) is a representation of $X_j(t)$ in terms of its individual
innovations.


The joint innovations, denoted $\underline{\eta}_j(t)$, are defined in terms of the
predictors, denoted $\underline{X}_j^*(t)$, of each series given the pasts of all of
them; in symbols

(19)  $\underline{X}_j^*(t) = E[X_j(t) \mid X_k(s),\ s < t \text{ and } k = 1, 2, \ldots, r]$

and

(20)  $\underline{\eta}_j(t) = X_j(t) - \underline{X}_j^*(t).$

The joint innovation multiple time series, denoted $\eta(\cdot)$ and
defined by $\eta(\cdot)' = (\underline{\eta}_1(\cdot), \ldots, \underline{\eta}_r(\cdot))$, is multiple white noise in the
sense that

(21)  $E[\eta(s)\, \eta'(t)] = 0$  for  $s \ne t,$

and therefore is described by the innovation covariance matrix

(22)  $\Sigma = E[\eta(t)\, \eta'(t)].$

The joint innovation approach models $X(\cdot)$ by estimating: (i) the
innovation covariance matrix $\Sigma$, and (ii) the multi-input multi-output
filter which transforms the joint innovations $\eta(\cdot)$ to the observed
multiple time series $X(\cdot)$.

The individual innovation approach models $X(\cdot)$ by estimating:
(i) each individual innovation series $\eta_j(\cdot)$ and the filter transforming
$\eta_j(\cdot)$ to $X_j(\cdot)$; (ii) a model for the multiple time series of
individual innovations [denoted $\tilde{\eta}(t)$ and defined by
$\tilde{\eta}(\cdot)' = (\eta_1(\cdot), \ldots, \eta_r(\cdot))$] in terms of their joint innovations
[to be denoted $\epsilon(\cdot)$ and called the innovation innovations] and the
multi-input multi-output filter which transforms $\epsilon(\cdot)$ to $\tilde{\eta}(\cdot)$.

Point  Five  of  this  paper  is:  to  estimate  the  joint  innovation 
structure  of  a  multiple  time  series  we  fit  it  by  a  sufficiently  long 
joint  autoregressive  scheme.  Probabilistic  justification  of  such  fits 
is  provided  by  the  work  of  Masani  (see,  for  example,  Section  1J  of 
Masani's review paper (1966) at the First Symposium on Multivariate
Analysis) . 

A zero mean covariance stationary multiple time series $X(\cdot)$ is
called a joint autoregressive scheme of order m if the infinite
memory predictor $X^*(t)$ can be expressed as a linear combination of
$X(t-1), \ldots, X(t-m)$:

(23)  $X^*(t) = A(1)\, X(t-1) + \cdots + A(m)\, X(t-m).$

When the multiple time series $X(\cdot)$ is known to be a joint autoregressive
scheme of order m, the autoregressive matrix coefficients $A(1), \ldots, A(m)$
are estimated from a sample $\{X(t),\ t = 1, 2, \ldots, T\}$ of size T by
the solutions $\hat{A}(1), \ldots, \hat{A}(m)$ of a system of equations called the
multiple Yule-Walker equations:

(24)  $\sum_{j=1}^{m} \hat{A}(j)\, R_T(j-k) = R_T(-k), \qquad k = 1, 2, \ldots, m,$

where $R_T(v)$ is the sample covariance matrix defined in the next
section.

The Yule-Walker equations are suggested by the fact that the matrix
coefficients $A(1), \ldots, A(m)$ of the finite memory predictor in (23)
satisfy the normal equations

(25)  $E[X^*(t)\, X'(t-k)] = E[X(t)\, X'(t-k)], \qquad k = 1, 2, \ldots, m,$

or

(26)  $\sum_{j=1}^{m} A(j)\, R(j-k) = R(-k), \qquad k = 1, 2, \ldots, m.$

The prediction error covariance matrix $\Sigma$ satisfies

(27)  $\Sigma = E[\{X(t) - X^*(t)\}\, X'(t)] = R(0) - \sum_{j=1}^{m} A(j)\, R(j);$

the natural estimator of $\Sigma$ is

(28)  $\hat{\Sigma} = R_T(0) - \sum_{j=1}^{m} \hat{A}(j)\, R_T(j).$

It is important to note that the computation of $\hat{A}(1), \ldots, \hat{A}(m), \hat{\Sigma}$ is
most conveniently done recursively [as in Whittle (1965), Jones (1964),
or Robinson (1967)].
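As an illustration of the recursive computation just mentioned, the following is a minimal sketch for the univariate case r = 1 only, where the Yule-Walker equations can be solved by the classical Levinson-Durbin recursion. It is not the matrix recursion of the cited references; the function names `sample_autocov` and `levinson_durbin` are hypothetical and chosen for this example.

```python
def sample_autocov(x, max_lag):
    """Sample covariances R_T(v) = (1/T) * sum_{t} x(t) x(t+v), v = 0..max_lag.

    Assumes the series has already been reduced to zero mean.
    """
    T = len(x)
    return [sum(x[t] * x[t + v] for t in range(T - v)) / T
            for v in range(max_lag + 1)]


def levinson_durbin(R, m):
    """Solve sum_j a_j R(j-k) = R(k), k = 1..m, recursively in the order.

    R is a list with R[v] the covariance at lag v (scalar series).
    Returns (a, sigma2): a[j-1] is the coefficient of X(t-j) and sigma2
    is the prediction error variance R(0) - sum_j a_j R(j).
    """
    a = []
    sigma2 = R[0]
    for k in range(1, m + 1):
        # Partial correlation (reflection) coefficient at order k.
        acc = R[k] - sum(a[j] * R[k - 1 - j] for j in range(k - 1))
        kappa = acc / sigma2
        # Update the order k-1 coefficients and append the new one.
        a = [a[j] - kappa * a[k - 2 - j] for j in range(k - 1)] + [kappa]
        sigma2 *= (1.0 - kappa * kappa)
    return a, sigma2


# Covariances of a first order scheme X(t) = 0.5 X(t-1) + eta(t), Var eta = 1.
R = [4.0 / 3.0, 2.0 / 3.0, 1.0 / 3.0]
coeffs, innovation_variance = levinson_durbin(R, 2)
```

Each pass of the loop raises the order by one and reuses the previous solution, which is what makes fitting a "sufficiently long" scheme inexpensive; the returned variance is the scalar version of (28).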

From the work of Wold (1954), Mann and Wald (1944) and Whittle (1955)
we know the properties of the autoregressive coefficient estimators
$\hat{A}(1), \ldots, \hat{A}(m)$; indeed, their properties are very similar to those of
estimators of multivariate regression coefficients [as given, for example,
in Kendall and Stuart (1966), p. 275].
What has not been explicitly proved in the literature, but seems
plausible (and basic to the innovation approach to modelling) is that
$(T-m)\,\hat{\Sigma}$ is approximately distributed as the Wishart distribution of
dimension r, degrees of freedom T - m, and covariance matrix $\Sigma$. In
Section 3, we will make the point that multivariate analysis of $\hat{\Sigma}$ is
an important tool of multiple time series modelling.
3.  Spectral Approaches to Modelling


The spectral approach to multiple stationary time series analysis
assumes that each component is non-deterministic, and in addition assumes
that the covariance matrix $R(\cdot)$ is absolutely summable in the sense
that

(1)  $\sum_{v=-\infty}^{\infty} |R_{hj}(v)| < \infty$  for  $h, j = 1, 2, \ldots, r,$

so that an explicit formula can be given for the spectral density matrix

(2)  $f(\omega) = \frac{1}{2\pi} \sum_{v=-\infty}^{\infty} e^{-iv\omega} R(v)$

or

(3)  $f(\omega) = \begin{bmatrix} f_{11}(\omega) & \cdots & f_{1r}(\omega) \\ \vdots & & \vdots \\ f_{r1}(\omega) & \cdots & f_{rr}(\omega) \end{bmatrix} = \{f_{hj}(\omega)\};$

in terms of $f(\omega)$, we have a Fourier representation for $R(v)$:

(4)  $R(v) = \int_{-\pi}^{\pi} e^{iv\omega} f(\omega)\, d\omega.$

One calls $f(\omega)$ the spectral density matrix of the covariance stationary
multiple time series $X(\cdot)$; for further discussion see Rozanov (1967),
Granger (1964), Jenkins and Watts (1968).

Point Six of this paper is: there are three kinds of sample
statistics in multiple time series modelling, which one should use
simultaneously and between which one should know how to transform
quickly. The three kinds of sample statistics are:

(i)  the sample covariance matrix,

(ii)  a matrix of estimated spectral densities,

(iii)  various innovations and sample autoregressive coefficients.

The sample covariance matrix is defined by

(5)  $R_T(v) = \frac{1}{T} \sum_{t=1}^{T-v} X(t)\, X'(t+v),$  for  $v = 0, 1, \ldots, T-1.$

When the multiple time series $X(\cdot)$ has zero means and is covariance
stationary, one regards $R_T(v)$ as an estimate of the value of the
covariance matrix $R(v)$. Since $R(-v) = R'(v)$ we define $R_T(-v) = R_T'(v)$.

It should be noted that in the course of our discussion of
estimated spectral densities, it will be seen that in practice one should
rarely use (5) to compute $R_T(v)$, and one should not usually use (5)
to even define $R_T(v)$ for "detrended" time series.

While the sample covariance matrix $R_T(v)$ is of interest to
determine the lags v at which the components of $R_T(v)$ are significantly
non-zero, for time series modelling $R_T(\cdot)$ needs to be transformed
either to spectral density estimates or autoregressive coefficient
estimates.
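For concreteness, definition (5) and the convention $R_T(-v) = R_T'(v)$ can be written out directly. This is only a sketch of the definition, using the standard library; as noted above, (5) is rarely the recommended computational route. The function name `sample_cov_matrix` and the representation of the series as a list of length-r lists are assumptions made for this example.

```python
def sample_cov_matrix(X, v):
    """Sample covariance matrix R_T(v) of definition (5).

    X is a list of T observation vectors, each a list of r numbers,
    assumed already reduced to zero mean. Entry [h][j] of the result is
    (1/T) * sum_t X_h(t) X_j(t+v). Negative lags use R_T(-v) = R_T(v)'.
    """
    T = len(X)
    r = len(X[0])
    if v < 0:
        R = sample_cov_matrix(X, -v)
        # Transpose of the matrix at the positive lag.
        return [[R[j][h] for j in range(r)] for h in range(r)]
    return [[sum(X[t][h] * X[t + v][j] for t in range(T - v)) / T
             for j in range(r)]
            for h in range(r)]


# A short bivariate sample (T = 3, r = 2), for illustration only.
X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
R0 = sample_cov_matrix(X, 0)
R1 = sample_cov_matrix(X, 1)
R_minus1 = sample_cov_matrix(X, -1)
```

Note that the divisor is T at every lag, as in (5), rather than T - v.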


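A matrix $\hat{f}(\omega)$ of estimated spectral densities $\hat{f}_{hj}(\omega)$
is denoted

(6)  $\hat{f}(\omega) = \begin{bmatrix} \hat{f}_{11}(\omega) & \hat{f}_{12}(\omega) & \cdots & \hat{f}_{1r}(\omega) \\ \hat{f}_{21}(\omega) & \hat{f}_{22}(\omega) & \cdots & \hat{f}_{2r}(\omega) \\ \vdots & \vdots & & \vdots \\ \hat{f}_{r1}(\omega) & \hat{f}_{r2}(\omega) & \cdots & \hat{f}_{rr}(\omega) \end{bmatrix}.$
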
Point  Seven  of  this  paper  is:  the  spectral  approach  to  multivariate 
analysis  of  time  series  may  be  defined  to  be  concerned  with  the  relations 
among  time  series  that  can  be  inferred  statistically  from  the  matrix  of 
estimated spectral densities (as well as probabilistically from the
matrix of true spectral densities); for example, see Koopmans (1964).

We briefly describe ways of forming estimated spectra $\hat{f}(\omega)$
[for a tabular presentation of these remarks see Parzen (1968)]. First,
form the sample Fourier transform

(7)  $Z(\omega) = \sum_{t=1}^{T} e^{-i\omega t} X(t)$

at specified frequencies $\omega$ (we prefer $0, \frac{2\pi}{2T}, \ldots, (2T-1)\frac{2\pi}{2T}$). Second,
at these frequencies form the sample spectral density matrix by the
formula

(8)  $f_T(\omega) = \frac{1}{2\pi T}\, Z(\omega)\, Z'(-\omega)$

which satisfies the relations

(9)  $R_T(v) = \int_{-\pi}^{\pi} e^{iv\omega} f_T(\omega)\, d\omega,$

(10)  $f_T(\omega) = \frac{1}{2\pi} \sum_{|v| < T} e^{-iv\omega} R_T(v).$

Third, form estimators of $f(\omega)$ by averaging adjacent values of $f_T(\omega)$;
this averaging process is computationally faster if the averages one
considers are of the form

(11)  $\hat{f}\left(k \frac{2\pi}{2T}\right) = \sum_j W(j-k)\, f_T\left(j \frac{2\pi}{2T}\right),$

which we call filtered sample spectral density functions.

An alternative third step, which seems to me the most convenient (and
because of Fast Fourier Transform techniques perhaps faster) way to
compute $\hat{f}(\cdot)$, is via the method of covariance averages [compare Parzen
(1967b) or Jenkins (1967)]

(12)  $\hat{f}(\omega) = \frac{1}{2\pi} \sum_{|v| < T} e^{-iv\omega}\, k\!\left(\frac{v}{M}\right) R_T(v)$

in terms of a suitable kernel $k(\cdot)$ and constant M called the
truncation point. We prefer (12) to (11) since one can readily compute
$R_T(v)$ for $v = 0, 1, \ldots, T-1$ (through the Fast Fourier Transform) by
the formula

(13)  $R_T(v) = \frac{2\pi}{2T} \sum_{k=0}^{2T-1} \exp\!\left(ivk \frac{2\pi}{2T}\right) f_T\!\left(k \frac{2\pi}{2T}\right).$

For the proof of (13), compare Gentleman and Sande (1966), p. 573.
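Relation (13) can be checked numerically. The sketch below treats a single (univariate) series, evaluates the sample spectral density at the 2T frequencies $k \cdot 2\pi / 2T$, and recovers the sample covariances from it. For brevity it uses a direct sum of order T squared in place of a Fast Fourier Transform routine, so it illustrates the identity rather than the speed; all function names are hypothetical.

```python
import cmath
import math


def sample_fourier_transform(x, omega):
    """Z(omega) = sum_{t=1}^{T} exp(-i omega t) x(t), with x[0] holding x(1)."""
    return sum(cmath.exp(-1j * omega * (t + 1)) * x[t] for t in range(len(x)))


def sample_spectral_density(x, omega):
    """Univariate sample spectral density |Z(omega)|^2 / (2 pi T)."""
    T = len(x)
    return abs(sample_fourier_transform(x, omega)) ** 2 / (2.0 * math.pi * T)


def covariances_from_spectrum(x):
    """Recover R_T(v), v = 0..T-1, from f_T on the grid of 2T frequencies."""
    T = len(x)
    N = 2 * T
    freqs = [2.0 * math.pi * k / N for k in range(N)]
    f_vals = [sample_spectral_density(x, w) for w in freqs]
    covs = []
    for v in range(T):
        total = sum(cmath.exp(1j * v * w) * f for w, f in zip(freqs, f_vals))
        # The imaginary part vanishes up to rounding for a real series.
        covs.append((2.0 * math.pi / N) * total.real)
    return covs


x = [1.0, -2.0, 0.5, 3.0, -1.5]
recovered = covariances_from_spectrum(x)
direct = [sum(x[t] * x[t + v] for t in range(len(x) - v)) / len(x)
          for v in range(len(x))]
```

The grid of 2T (rather than T) frequencies matters: with only T frequencies the inverse sum would return the circularly wrapped covariances $R_T(v) + R_T(T - v)$ instead of $R_T(v)$.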

To interpret estimated spectra, one has to take account of both
their variability and their bias. The basic approximation on variability
[which was first noted by Goodman (1963) and proved by Wahba (1968) and
Brillinger (1970)] is that an estimated spectral density matrix $\hat{f}(\omega)$
of form (12), or equivalently (11), has the following approximate
distribution: $\nu \hat{f}(\omega)$ has a complex Wishart distribution of dimension
r, degrees of freedom $\nu$, and covariance matrix $f(\omega)$, where


By identifying the distribution of $\hat{f}(\omega)$ with the Wishart distribution,
one reduces to standard problems of multivariate analysis the problem
of finding the distribution of various statistics derived from $\hat{f}(\omega)$.

To  conclude  this  section,  we  note  that  the  foregoing  computational 
path  seems  especially  appropriate  when  one  cannot  assume  the  observed 
time series to have zero mean (no trend), and desires to detrend the
series $X(\cdot)$ by passing each component $X_j(\cdot)$ through a filter to
form a detrended series ${}_dX_j(\cdot)$:

(15)  ${}_dX_j(t) = \sum_a X_j(t-a)\, w_j(a).$

However the Fourier transform ${}_dZ_j(\cdot)$ can be formed [without first
forming ${}_dX_j(\cdot)$] directly from the Fourier transform $Z_j(\cdot)$ by

(16)  ${}_dZ_j(\omega) = Z_j(\omega)\, W_j(\omega)$

where

(17)  $W_j(\omega) = \sum_a w_j(a) \exp(i\omega a).$

Further, to form ${}_dX_j(\cdot)$ as a function of time one need only invert the
Fourier transform of ${}_dX_j(\cdot)$. The sample spectral density matrix and
sample covariance matrix of a detrended multiple time series ${}_dX(\cdot)$
seem to me to be best computed by

(18)  $f_T(\omega) = \frac{1}{2\pi T}\, {}_dZ(\omega)\, {}_dZ'(-\omega)$

and (13) respectively. I must admit that as yet I have no practical
experience with comparisons of formula (18) with more "direct" methods
of calculation.
4.  Relations Between Time Series


Given two jointly stationary zero mean multiple time series $X(\cdot)$
and $Y(\cdot)$ there are a variety of relation filters. To regard $X(\cdot)$ as
the independent variable is to regard it as the input of a filter whose
output, denoted $Y^*(\cdot)$, provides a representation of $Y(\cdot)$ as a sum of
two terms:

(1)  $Y(t) = Y^*(t) + \epsilon(t), \qquad \epsilon(t) = Y(t) - Y^*(t).$

Point Eight of this paper is: it is most meaningful to define
$Y^*(t)$ as a minimum mean square error linear predictor of $Y(t)$ given
a specified past of the $X(\cdot)$ series [for example $\{X(s),\ s = 0, \pm 1, \pm 2, \ldots\}$
or $\{X(s),\ s < t\}$]. In other words, $E[|Y(t) - Y^*(t)|^2]$ is a minimum
among all possible linear functionals of the specified values of $X(\cdot)$.
Then $\epsilon(\cdot)$ is the error series, characterized by the normal property
(for each $t = 0, \pm 1, \pm 2, \ldots$)

(2)  $E[X(s)\, \epsilon'(t)] = 0$  for all indices s such that $X(s)$ is part
     of the memory used to form $Y^*(t)$.

In addition to specifying the past of $X(\cdot)$ used to form $Y^*(t)$ as a
linear predictor of $Y(t)$, one may specify "matrix restraints" on the
form of $Y^*(t)$ of the type considered by Brillinger (1969) in his paper
at this symposium.

The system which transforms $X(\cdot)$ to $Y^*(\cdot)$ is a filter which,
when $X(\cdot)$ and $Y(\cdot)$ are jointly covariance stationary, is time
invariant. The spectral representation of this filter is a matrix
function of $\omega$, denoted $B_{Y^*}(\omega)$ and called the filter transfer function,
best described by assuming that the filter is an infinite moving
average

(3)  $Y^*(t) = \sum_{k=-\infty}^{\infty} \beta(k)\, X(t-k)$

where $\{\beta(k),\ k = 0, \pm 1, \ldots\}$ is a sequence of matrices called the
filter response function. The filter transfer function is defined by

(4)  $B_{Y^*}(\omega) = \sum_{k=-\infty}^{\infty} e^{-i\omega k} \beta(k).$

The relation between $Y(\cdot)$ and $X(\cdot)$ is resolved into a
deterministic dynamic system represented by the relation filter with
filter transfer function $B_{Y^*}(\omega)$ and a stochastic driving function
represented by the error series $\epsilon(\cdot)$ with spectral density matrix
denoted by $f_{Y^*}(\omega)$.

When $Y^*(t)$ is a function of all values of $X(\cdot)$ in the sense
that

(5)  $Y^*(t) = E[Y(t) \mid X(s),\ s = 0, \pm 1, \ldots],$

we call $f_{Y^*}(\omega)$ the partial spectral density matrix of $Y(\cdot)$, given
$X(\cdot)$.

Point Nine of this paper is: the spectral theory of multivariate
analysis of time series has been mainly concerned with finding:

(i) formulas for $B_{Y^*}(\omega)$ and $f_{Y^*}(\omega)$ in terms of the joint spectral
density matrix of the $X(\cdot)$ and $Y(\cdot)$ multiple time series

(6)  $f(\omega) = \begin{bmatrix} f_{XX}(\omega) & f_{XY}(\omega) \\ f_{YX}(\omega) & f_{YY}(\omega) \end{bmatrix}$

and (ii) the sampling properties of the natural estimators $\hat{B}_{Y^*}(\omega)$
and $\hat{f}_{Y^*}(\omega)$ formed from an estimated spectral density matrix

(7)  $\hat{f}(\omega) = \begin{bmatrix} \hat{f}_{XX}(\omega) & \hat{f}_{XY}(\omega) \\ \hat{f}_{YX}(\omega) & \hat{f}_{YY}(\omega) \end{bmatrix}$

under the assumption that for a suitable number $\nu$ of degrees of
freedom, $\nu \hat{f}(\omega)$ obeys a complex Wishart distribution of $\nu$ degrees of
freedom and covariance matrix $f(\omega)$.

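For example, by the usual matrix pivoting procedures used to solve
normal equations, one can transform [see Parzen (1967b)] an estimated
spectral density matrix (6) to a partitioned matrix

(8)  $\begin{bmatrix} \hat{f}_{XX}^{-1}(\omega) & \ast \\ \hat{B}_{Y^*}(\omega) & \hat{f}_{Y^*}(\omega) \end{bmatrix}$

where

(9)  $\hat{B}_{Y^*}(\omega) = \hat{f}_{YX}(\omega)\, \hat{f}_{XX}^{-1}(\omega), \qquad \hat{f}_{Y^*}(\omega) = \hat{f}_{YY}(\omega) - \hat{f}_{YX}(\omega)\, \hat{f}_{XX}^{-1}(\omega)\, \hat{f}_{XY}(\omega)$

are natural estimators of the regression transfer function and error
spectrum respectively for $Y^*(\cdot)$ defined by (5); for the multivariate
analogue of (9) see Anderson (1958).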
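To make the regression transfer function and error spectrum concrete, here is a minimal sketch for the simplest case of one input series and one output series, so that every block of the estimated spectral density matrix is a single (possibly complex) number at a given frequency. The formulas are the scalar versions of $\hat{f}_{YX} \hat{f}_{XX}^{-1}$ and $\hat{f}_{YY} - \hat{f}_{YX} \hat{f}_{XX}^{-1} \hat{f}_{XY}$; the function name and the numerical values are invented for illustration.

```python
def regression_transfer(f_xx, f_xy, f_yx, f_yy):
    """Scalar regression transfer function and error spectrum at one frequency.

    f_xx, f_yy: estimated spectral densities of input and output (real, positive).
    f_xy, f_yx: estimated cross-spectral densities (complex conjugates of
    each other for real series).
    Returns (B, f_err) with B = f_yx / f_xx and
    f_err = f_yy - f_yx * f_xy / f_xx.
    """
    B = f_yx / f_xx
    f_err = f_yy - f_yx * f_xy / f_xx
    return B, f_err


# Illustrative values at a single frequency (not taken from any data set).
B, f_err = regression_transfer(2.0, 1.0 + 1.0j, 1.0 - 1.0j, 3.0)
gain = abs(B)
```

In the matrix case the division by `f_xx` becomes multiplication by the inverse of the input block, which is what the pivoting procedure above carries out.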

The work that has been done on estimating relations between time
series in terms of $\hat{B}_{Y^*}(\omega)$ and $\hat{f}_{Y^*}(\omega)$ leaves open a number of
problems and issues which it is the major aim of this paper to point out:
(1)  One would like to describe in the time domain the filter
which $\hat{B}_{Y^*}(\omega)$ estimates in the frequency domain; one way of doing
this is to write

(10)  $\hat{B}_{Y^*}(\omega) = \sum_{k=-m}^{n} \beta(k)\, e^{-i\omega k} + \{\hat{B}_{Y^*}(\omega) - B_{Y^*}(\omega)\}$

where m, n, and $\beta(k)$ are to be estimated and the "errors"
$\hat{B}_{Y^*}(\omega) - B_{Y^*}(\omega)$ are approximately normal with zero means and asymptotic
variances that can be estimated; often the "errors" at different
frequencies can be shown to be asymptotically independent. Pioneering and
elegant work on this problem has been done by E. J. Hannan [see his
papers, Hannan (1963), (1965), (1967), Hamon and Hannan (1963)]. From
(10) one estimates the coefficients $\beta(k)$ by regression analysis.

(2)  One would like to describe (model) in terms of a time domain
filter with white noise input the error series $\epsilon(\cdot)$ whose spectral
density matrix $f_{Y^*}(\omega)$ is estimated by $\hat{f}_{Y^*}(\omega)$.

(3)  The  sampling  theory  of  the  usual  spectral  estimators  (namely, 
smoothed  sample  spectral  densities)  is  based  entirely  on  variability 
theory  [for  example,  Rosenblatt  (1959)  and  Goodman  (1963)]  and  ignores 
the  fact  that  estimation  of  cross-spectra  by  the  usual  method  of 
smoothed  sample  spectral  densities  is  subject  to  serious  bias  errors 
[Akaike  and  Yamanouchi  (1961),  Nettheim  (1966),  Parzen  (1967b),  Tick 
(1967)].  I  believe  that  it  can  be  shown  that  spectral  estimators 
which  have  "minimum"  bias  and  variability  can  be  found  by  fitting  long 
enough  autoregressive  schemes.  We  indicate  in  this  paper  several 
"autoregression  approaches"  to  cross-spectral  estimation  and  to  fitting 
time  domain  models  to  time  series. 
(4)  The relations between time series which can be inferred from
estimated spectra are not "causal" unless the relations are between
time series physically measured at the input and output respectively of
a causal filter. Causal relations can be fitted only through "innovations"
which can be found by fitting long autoregressive schemes. In other
words, for finding relations between two arbitrary time series, spectral
methods suffer from the drawback that they work directly only for
predictors whose memory involves the future as well as the past. They
cannot easily be used to estimate the error spectrum, and (more importantly)
the filter transfer function from $X(\cdot)$ to $Y^*(\cdot)$, for cases such as
the following:

      $Y^*(t) = E[Y(t) \mid X(s),\ s < t],$

(11)  $Y^*(t) = E[Y(t) \mid X(s),\ s < t \text{ and } Y(s),\ s < t],$

      $Y^*(t) = E[Y(t) \mid X(s),\ s \le t \text{ and } Y(s),\ s < t].$

Autoregressive methods seem to provide directly estimators of these
semi-infinite memory predictors.

To the multiple time series


(12)  $\begin{bmatrix} X(t) \\ Y(t) \end{bmatrix}$

one can fit a sufficiently long autoregressive scheme

(13)  $\begin{bmatrix} X(t) \\ Y(t) \end{bmatrix} = A(1) \begin{bmatrix} X(t-1) \\ Y(t-1) \end{bmatrix} + \cdots + A(m) \begin{bmatrix} X(t-m) \\ Y(t-m) \end{bmatrix} + \eta(t)$

where $\eta(t)$ is multiple white noise. Writing

(14)  $A(j) = \begin{bmatrix} A_{XX}(j) & A_{XY}(j) \\ A_{YX}(j) & A_{YY}(j) \end{bmatrix}, \qquad \eta(t) = \begin{bmatrix} \eta_X(t) \\ \eta_Y(t) \end{bmatrix},$

one obtains a relation between the $X(\cdot)$ and $Y(\cdot)$ series:

(15)  $Y(t) - A_{YY}(1)\, Y(t-1) - \cdots - A_{YY}(m)\, Y(t-m) = A_{YX}(1)\, X(t-1) + \cdots + A_{YX}(m)\, X(t-m) + \eta_Y(t).$

We next show how to add $X(t)$ to the relation (15). Given $X(t)$,
and the past of $X(\cdot)$ and $Y(\cdot)$ up to t - 1, one can form

(16)  $\eta_X(t) = X(t) - X^*(t) = X(t) - A_{XX}(1)\, X(t-1) - \cdots - A_{XX}(m)\, X(t-m) - A_{XY}(1)\, Y(t-1) - \cdots - A_{XY}(m)\, Y(t-m).$

Next, from $\eta_X(t)$ and the innovation covariance matrix $\Sigma$ one can
form a predictor $\eta_Y^*(t)$ of $\eta_Y(t)$. To write $\eta_Y^*(t)$ explicitly,
partition $\Sigma$:

(17)  $\Sigma = \begin{bmatrix} \Sigma_{XX} & \Sigma_{XY} \\ \Sigma_{YX} & \Sigma_{YY} \end{bmatrix}.$

Then

(18)  $\eta_Y^*(t) = \Sigma_{YX}\, \Sigma_{XX}^{-1}\, \eta_X(t).$

In (15) substitute $\eta_Y^*(t)$, given by (18), for $\eta_Y(t)$; one thus obtains
the formula

(19)  $E[Y(t) \mid X(s),\ s \le t \text{ and } Y(s),\ s < t] = A_{YY}(1)\, Y(t-1) + \cdots + A_{YY}(m)\, Y(t-m) + A_{YX}(1)\, X(t-1) + \cdots + A_{YX}(m)\, X(t-m) + \eta_Y^*(t).$
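The regression of one innovation component on the other can be sketched for the simplest case in which X and Y are each a single series, so that the blocks of the innovation covariance matrix are scalars. The sketch estimates those blocks from residual series and returns the coefficient that multiplies the X innovation in the predictor of the Y innovation. The residual variance returned alongside it is the usual regression by-product and is an addition for illustration; the function name and the sample values are hypothetical.

```python
def innovation_regression(eta_x, eta_y):
    """Regress the Y innovations on the X innovations (scalar case).

    eta_x, eta_y: equal-length lists of fitted innovations (residuals of a
    joint autoregressive fit), assumed to have zero mean.
    Returns (coef, resid_var) where coef = S_yx / S_xx, so that the
    predictor of eta_y(t) is coef * eta_x(t), and
    resid_var = S_yy - S_yx**2 / S_xx.
    """
    n = len(eta_x)
    s_xx = sum(a * a for a in eta_x) / n
    s_yx = sum(b * a for a, b in zip(eta_x, eta_y)) / n
    s_yy = sum(b * b for b in eta_y) / n
    coef = s_yx / s_xx
    resid_var = s_yy - s_yx * s_yx / s_xx
    return coef, resid_var


# Illustrative residuals in which eta_y is exactly twice eta_x.
coef, resid_var = innovation_regression([1.0, -1.0, 2.0, -2.0],
                                        [2.0, -2.0, 4.0, -4.0])
eta_y_star = [coef * e for e in [1.0, -1.0, 2.0, -2.0]]
```

A coefficient near zero indicates that the current value of X adds little to the prediction of Y beyond what the two pasts already provide.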

One often seeks a parsimonious parameterization of the filter with
output $Y^*(\cdot)$. It might be sought through stepwise regression among
predictor formulas of the form

(20)  $Y^*(t) = C_{YY}(1)\, Y(t-1) + \cdots + C_{YY}(m)\, Y(t-m) + C_{YX}(1)\, X(t-1) + \cdots + C_{YX}(m)\, X(t-m) + C_{Y\eta}(1)\, \eta_Y(t-1) + \cdots + C_{Y\eta}(m)\, \eta_Y(t-m).$

Once a relation of form (20) has been fitted, it can be computed
recursively.

The foregoing models correspond to time domain versions of
predictor formulas for $Y(t)$ in which no rank constraints are imposed on the
matrix coefficients. It would be of interest to develop time domain
versions of the results of Brillinger (1969) on predictors with rank
restraints.

(5)  A final point, which seems to me the most important:
multivariate analysis of the joint innovation covariance matrix

(21)  $\Sigma = E[\eta(t)\, \eta'(t)]$

provides interesting relations among components of the time series. The
use of regression analysis of $\Sigma$ was discussed in equations (17) - (19).
The eigenvalues and eigenvectors of $\Sigma$ seem worth being routinely
computed and examined to indicate ways of reducing the dimensionality
of the data vector. The canonical correlations between $\eta_X$ and $\eta_Y$
seem to have meaningful interpretations, such as for testing for lack
of correlation between $X(\cdot)$ and $Y(\cdot)$.




5.  Autoregressive Approach to a Single Time Series


When  modelling  a  single  time  series  X(-)>  one  is  interested  in 
describing  the  innovation  to  series  filter  [which  transforms  the  inno¬ 
vation  Tj(  • )  to  X(«)]  in  time  domain  terms  . 

Any filter can be approximately expressed as a combination of
autoregressive and moving average terms:

$$ X(t) - a_1 X(t-1) - \dots - a_m X(t-m) = \eta(t) + b_1\,\eta(t-1) + \dots + b_q\,\eta(t-q); \tag{1} $$

one regards the orders $m$ and $q$, as well as the coefficients
$a_1, \dots, a_m, b_1, \dots, b_q$, as parameters to be estimated. In this
formula, it is usual to think of $\eta(\cdot)$ as a white noise series. A
minor point of this paper is: we always require $\eta(\cdot)$ to be the
innovation series of $X(\cdot)$.

It turns out that assuming the model for $X(\cdot)$ to be of the form
(1) is equivalent to assuming a model for the one step linear predictor
$X^*(t)$ of the form

$$ X^*(t) = a_1 X(t-1) + \dots + a_m X(t-m) + b_1\,\eta(t-1) + \dots + b_q\,\eta(t-q). \tag{2} $$

When $X^*(t)$ satisfies model (2) with: (i) $b_1 = \dots = b_q = 0$, we
call $X(\cdot)$ an autoregressive scheme of order $m$; (ii)
$a_1 = \dots = a_m = 0$, we call $X(\cdot)$ a moving average scheme of
order $q$; (iii) some $a$'s and some $b$'s non-zero, we call $X(\cdot)$ a
mixed scheme.

In other words, the usual models considered for stationary time
series can equivalently be formulated as models for the predictors
$X^*(t)$.
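The recursive character of a predictor of form (2) can be illustrated with a minimal Python sketch. The function name `one_step_predictions` and the zero start-up values are assumptions made for illustration only; they are not part of the paper.

```python
import numpy as np

def one_step_predictions(x, a, b):
    """Run a predictor of form (2):
    X*(t) = sum_k a_k X(t-k) + sum_k b_k eta(t-k).

    Assumption: values of X and eta before the start of the sample are
    taken to be zero, so the first few predictions are start-up values.
    """
    x = np.asarray(x, dtype=float)
    pred = np.zeros_like(x)
    eta = np.zeros_like(x)
    for t in range(len(x)):
        ar = sum(a[k] * x[t - 1 - k] for k in range(len(a)) if t - 1 - k >= 0)
        ma = sum(b[k] * eta[t - 1 - k] for k in range(len(b)) if t - 1 - k >= 0)
        pred[t] = ar + ma
        eta[t] = x[t] - pred[t]  # innovation = observation minus prediction
    return pred, eta
```

With `b` empty the recursion reduces to an autoregressive predictor, and with `a` empty to a moving average predictor, matching cases (i) and (ii).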

In modelling time series our aim is to obtain a parsimonious
parameterization of the form (2). There are available several methods
for estimating the parameters of the mixed scheme (2); see Box and
Jenkins (1969); Durbin (1959), (1960); Walker (1961), (1962), (1967);
and Philips (1967). Possibly a new variant is the following method.
First, fit $X(\cdot)$ by a long autoregressive scheme

$$ X(t) = c_1 X(t-1) + \dots + c_M X(t-M) + \eta(t). \tag{3} $$

Efficient estimators $\hat c_1, \dots, \hat c_M$ of the coefficients of the
autoregressive scheme can be computed (at little computational cost, by
recursive methods).
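One familiar recursive method for the autoregressive coefficients is the Levinson-Durbin recursion on the sample autocovariances. The sketch below assumes that recursion; the paper does not say which recursive method is intended, and the function names are hypothetical.

```python
import numpy as np

def sample_autocovariance(x, max_lag):
    """R_T(v) = (1/T) * sum_t x(t) x(t+v) for a zero-mean sample."""
    x = np.asarray(x, dtype=float)
    T = len(x)
    return np.array([np.dot(x[: T - v], x[v:]) / T for v in range(max_lag + 1)])

def levinson_durbin(R, M):
    """Solve the Yule-Walker equations for every order m = 0, ..., M.

    Returns (coeffs, sigma2): coeffs[m] holds the m coefficients of the
    order-m scheme, sigma2[m] the corresponding innovation variance.
    """
    coeffs = [np.zeros(0)]
    sigma2 = [R[0]]
    for m in range(1, M + 1):
        prev = coeffs[-1]
        # partial autocorrelation of lag m
        kappa = (R[m] - np.dot(prev, R[m - 1:0:-1])) / sigma2[-1]
        new = np.empty(m)
        new[: m - 1] = prev - kappa * prev[::-1]
        new[m - 1] = kappa
        coeffs.append(new)
        sigma2.append(sigma2[-1] * (1.0 - kappa ** 2))
    return coeffs, np.array(sigma2)
```

Each step costs on the order of m operations, so all orders up to M are obtained in roughly M squared operations, and the innovation variances of every order come out as a by-product.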

Second, consider the transfer function

$$ C(z) = 1 - c_1 z - c_2 z^2 - \dots - c_M z^M \tag{4} $$

and form its estimator

$$ \hat C(z) = 1 - \hat c_1 z - \hat c_2 z^2 - \dots - \hat c_M z^M. \tag{5} $$

Third, note that the transfer functions which we seek to estimate,

$$ A(z) = 1 - a_1 z - \dots - a_m z^m, \qquad B(z) = 1 + b_1 z + \dots + b_q z^q, \tag{6} $$

are related to $C(z)$ by

$$ C(z) = \frac{A(z)}{B(z)}. \tag{7} $$

The parameters of $A(z)$ and $B(z)$ are to be estimated (by non linear
least squares) from

$$ \hat C(e^{i\omega}) = \frac{A(e^{i\omega})}{B(e^{i\omega})} + \mathrm{error}(e^{i\omega}), \tag{8} $$

where the error term is a time series (regarded as a function of the
index $\omega$) defined by

$$ \mathrm{error}(e^{i\omega}) = \hat C(e^{i\omega}) - C(e^{i\omega}). \tag{9} $$
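As a rough illustration of fitting $A$ and $B$ to $\hat C$ over a grid of frequencies, the sketch below uses a linearized variant rather than the non linear least squares the text proposes: multiplying the relation $\hat C \approx A/B$ through by $B$ gives $\hat C B \approx A$, which is linear in the unknown coefficients. The function names, the grid size and the unweighted criterion are all assumptions.

```python
import numpy as np

def transfer(coefs, z, sign):
    """Evaluate 1 + sign * sum_k coefs[k-1] * z**k at the points z."""
    k = np.arange(1, len(coefs) + 1)
    return 1.0 + sign * (z[:, None] ** k) @ np.asarray(coefs, dtype=float)

def fit_arma_from_long_ar(c_hat, m, q, n_freq=64):
    """Linearized least squares fit of C_hat * B ~= A on a frequency grid."""
    omega = np.linspace(0.0, np.pi, n_freq)
    z = np.exp(1j * omega)
    C = transfer(c_hat, z, -1.0)  # C_hat(z) = 1 - sum_k c_k z^k
    ka = np.arange(1, m + 1)
    kb = np.arange(1, q + 1)
    # C * (1 + sum b_k z^k) = 1 - sum a_k z^k
    #   =>  C - 1 = -sum a_k z^k - sum b_k z^k C
    design = np.hstack([-(z[:, None] ** ka), -(z[:, None] ** kb) * C[:, None]])
    target = C - 1.0
    # stack real and imaginary parts so that the unknowns stay real
    X = np.vstack([design.real, design.imag])
    y = np.concatenate([target.real, target.imag])
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return theta[:m], theta[m:]
```

The linearized solution could serve as a starting value for a non linear least squares iteration on the original criterion.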

Point Ten of this paper is: it can be shown that the error series
(9) can be regarded as asymptotically uncorrelated at different
frequencies and with variance

$$ \mathrm{Var}[\hat C(e^{i\omega})] = \frac{M}{T}\,|C(e^{i\omega})|^2, \tag{10} $$

which is easily estimated.

To motivate (10) let us note that one may regard the estimated
autoregressive coefficients

$$ \hat c_1, \hat c_2, \dots, \hat c_M \tag{11} $$

as a "covariance stationary time series" with means $c_1, \dots, c_M$ and
spectral density function

$$ f_c(\omega) = \frac{1}{2\pi}\,\frac{1}{T}\,|C(e^{i\omega})|^2. \tag{12} $$

By the theory of the periodogram

$$ E\,|\hat C(e^{i\omega}) - C(e^{i\omega})|^2 = E\,\Big|\sum_{k=1}^{M} (\hat c_k - c_k)\,e^{i\omega k}\Big|^2 = 2\pi M f_c(\omega) = \frac{M}{T}\,|C(e^{i\omega})|^2, \tag{13} $$

and the values $\hat C(e^{i\omega})$ at different frequencies are
asymptotically uncorrelated.


Point Eleven of this paper is: fitting a suitably long autoregressive
scheme to a univariate stationary time series is a possible
method of spectral density estimation [especially when one assumes that
there are no lines in the spectrum].

The usual type of estimator of the normalized spectral density
$\bar f(\omega)$ [where $\bar f(\omega) = f(\omega)/R(0)$] of a stationary
time series is a filtered sample spectral density of the form

$$ \bar f_{T,M}(\omega) = \frac{1}{2\pi} \sum_{|v|<T} e^{-iv\omega}\, k\!\left(\frac{v}{M}\right) \rho_T(v), \tag{14} $$

where $M$ is a suitable integer (called the truncation point) and $k(\cdot)$
is a suitable kernel. There is an extensive literature on how to choose
$M$ and $k(u)$ [see Parzen (1967b) and (1967c)].
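A filtered sample spectral density of this kind can be sketched in a few lines of Python. The kernel shown is the Parzen lag window, chosen here only as an example; the function names are hypothetical and the sample is assumed to have zero mean.

```python
import numpy as np

def parzen_kernel(u):
    """Parzen lag window; one common choice of k(u), used here as an example."""
    u = np.abs(np.asarray(u, dtype=float))
    return np.where(u <= 0.5, 1 - 6 * u**2 + 6 * u**3,
                    np.where(u <= 1.0, 2 * (1 - u) ** 3, 0.0))

def sample_autocorrelation(x):
    """rho_T(v) for v = 0, ..., T-1 (zero-mean sample assumed)."""
    x = np.asarray(x, dtype=float)
    T = len(x)
    R = np.array([np.dot(x[: T - v], x[v:]) / T for v in range(T)])
    return R / R[0]

def windowed_spectral_density(x, M, omega, kernel=parzen_kernel):
    """Kernel-weighted sum of sample autocorrelations at the frequencies omega."""
    rho = sample_autocorrelation(x)
    v = np.arange(1, len(rho))
    w = kernel(v / M) * rho[1:]
    omega = np.asarray(omega, dtype=float)
    # terms for v and -v combine into 2*cos(v*omega) because rho is even
    return (1.0 + 2.0 * np.cos(np.outer(omega, v)) @ w) / (2.0 * np.pi)
```

Passing a kernel that is identically one reproduces the normalized periodogram, which makes the smoothing role of `M` and `kernel` easy to inspect.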

It appears that an alternative estimator which is bias free and
has similar variability is the autoregressive spectral estimator
defined by

$$ \hat f(\omega) = \frac{1}{2\pi}\,\hat\sigma_M^2\,\Big|\,1 - \sum_{k=1}^{M} \hat c_k\, e^{-i\omega k}\Big|^{-2}. \tag{15} $$

While the idea of estimating the spectral density by first fitting an
autoregressive scheme has been alluded to in the literature, there has
been no treatment of its asymptotic variance. A variability theory is
now being developed by Parzen (1969). The properties of (15) are
discussed at the end of the next section in the context of the
multivariate case.
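For concreteness, an autoregressive spectral estimator can be evaluated as in the following sketch. It assumes the scheme is written as $X(t) = c_1 X(t-1) + \dots + c_M X(t-M) + \eta(t)$ with innovation variance `sigma2`, so that the estimate is `sigma2 / (2*pi)` divided by the squared modulus of $1 - \sum_k c_k e^{-i\omega k}$; the function name is hypothetical.

```python
import numpy as np

def ar_spectral_density(c, sigma2, omega):
    """sigma2 / (2*pi) * |1 - sum_k c_k exp(-i*omega*k)|**(-2)."""
    c = np.asarray(c, dtype=float)
    omega = np.asarray(omega, dtype=float)
    k = np.arange(1, len(c) + 1)
    transfer = 1.0 - np.exp(-1j * np.outer(omega, k)) @ c
    return sigma2 / (2.0 * np.pi * np.abs(transfer) ** 2)
```

In practice `c` and `sigma2` would be the fitted coefficients and the sample innovation variance of the chosen order; no window or kernel enters the computation.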


Finally we briefly discuss the question of how to determine the
order $M$ of a suitably long autoregressive scheme to fit to a sample
$\{X(t),\ t = 1, 2, \dots, T\}$ of a zero mean covariance stationary time
series.


For each order $m$, one can recursively form: (i) estimators
$\hat a_{1,m}, \dots, \hat a_{m,m}$ of the coefficients of the predictor of
finite memory $m$

$$ E[X(t) \mid X(t-1), \dots, X(t-m)] = a_{1,m} X(t-1) + \dots + a_{m,m} X(t-m), \tag{16} $$

and (ii) the sample innovation variance of order $m$

$$ \hat\sigma_m^2 = \{\rho_T(0) - \hat a_{1,m}\,\rho_T(1) - \dots - \hat a_{m,m}\,\rho_T(m)\}\, R_T(0), \tag{17} $$

where $\rho_T(v)$ is the sample correlation function. Define

$$ \lambda_m = -T \log \hat\sigma_m^2; \tag{18} $$

it is increasing as a function of $m = 1, 2, \dots, T-1$ and asymptotes
to $\lambda_\infty = -T \log \sigma_\infty^2$, where $\sigma_\infty^2$ is the
infinite memory prediction error variance.

A procedure for choosing an appropriate order $M$ such that $X(\cdot)$
can be regarded as an autoregressive scheme of order $M$ is: Choose $M$
to be the smallest value of $m$ such that $\lambda_\infty - \lambda_m$ is
less than the 95% significance value of the Chi-square distribution with
$T - m$ degrees of freedom. Extensive investigation is needed on the
theory and application of this procedure.
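The order selection rule can be sketched as follows. Two assumptions are made that the text does not supply: the unknown infinite memory variance is replaced by the innovation variance of the largest order fitted, and the 95% point of the Chi-square distribution is taken from the Wilson-Hilferty approximation instead of tables. The function names are hypothetical.

```python
import math

def chi2_quantile_95(df):
    """Wilson-Hilferty approximation to the 95% point of chi-square(df)."""
    z = 1.6449  # 95% point of the standard normal distribution
    return df * (1.0 - 2.0 / (9.0 * df) + z * math.sqrt(2.0 / (9.0 * df))) ** 3

def choose_order(sigma2, T):
    """Pick the smallest m whose lambda_m is close enough to lambda_infinity.

    sigma2[m] is the sample innovation variance of order m, m = 0..m_max.
    Assumption: sigma2[m_max] stands in for the infinite memory
    prediction error variance.
    """
    lam = [-T * math.log(s) for s in sigma2]  # lambda_m = -T log sigma2_m
    lam_inf = lam[-1]
    for m in range(1, len(sigma2)):
        if lam_inf - lam[m] < chi2_quantile_95(T - m):
            return m
    return len(sigma2) - 1
```

The list `sigma2` is the kind of output an order-recursive fit produces, so the rule adds almost nothing to the cost of fitting the schemes themselves.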

This suggestion can be roughly justified by the theory of likelihood
ratio tests of the hypothesis that the series satisfies an autoregressive
scheme of order $m$ versus the alternative hypothesis that it satisfies
an autoregressive scheme of order $T - 1$ [see Whittle (1952) or
Whittle's appendix to Wold (1954)].

It seems to me also justified from the likelihood point of view
since the likelihood of the data under the parameters
$\hat a_{1,M}, \dots, \hat a_{M,M}$ can be considered to be not
"significantly" different from the maximum likelihood of the data (which
is a monotone function of $\lambda_\infty$).

6.  Multiple Spectral Density Estimation

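In this section we discuss autoregressive approaches to estimating
the spectral density matrix

$$ f(\omega) = \begin{pmatrix} f_{11}(\omega) & \cdots & f_{1r}(\omega) \\ \vdots & & \vdots \\ f_{r1}(\omega) & \cdots & f_{rr}(\omega) \end{pmatrix}. \tag{1} $$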

Traditionally one estimates $f(\omega)$ by estimating each entry
$f_{hj}(\omega)$ by a filtered sample cross-spectral density

$$ f_{hj;T,M}(\omega) = \frac{1}{2\pi} \sum_{|v|<T} e^{-iv\omega}\, k\!\left(\frac{v}{M}\right) R_{hj;T}(v), \tag{2} $$

where $R_{hj;T}(v)$ is the sample cross-covariance function. Except for ease
of developing the distribution theory of the estimator $\hat f(\omega)$, there
seems to be no reason why one should use the same truncation point for
each component $f_{hj}(\omega)$.

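A method of letting the data determine an appropriate truncation
point for each component is to estimate it via a sample analogue of the
formula

$$ f_{hj}(\omega) = \tfrac{1}{2}\,\{ f_{X_h + X_j}(\omega) - f_{X_h}(\omega) - f_{X_j}(\omega) \} + \tfrac{i}{2}\,\{ f_{X_h + iX_j}(\omega) - f_{X_h}(\omega) - f_{X_j}(\omega) \}. \tag{3} $$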

Each univariate spectral density which appears in this formula could be
estimated by autoregressive spectral estimation, although further
research is needed on the theory of the complex valued univariate series
$X_h(\cdot) + iX_j(\cdot)$.
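The way a cross-spectral density is recovered from univariate spectral densities of $X_h + X_j$ and $X_h + iX_j$ can be checked numerically on raw periodograms. The sketch assumes the cross-periodogram is defined as the transform of $X_h$ times the complex conjugate of the transform of $X_j$; under the opposite convention the sign of the imaginary term changes. The function names are hypothetical.

```python
import numpy as np

def cross_from_univariate(f_sum, f_mix, f_h, f_j):
    """Recover a cross-spectrum from four real-valued univariate spectra.

    f_sum: spectrum of X_h + X_j;  f_mix: spectrum of X_h + i*X_j;
    f_h, f_j: spectra of X_h and X_j.
    """
    return 0.5 * (f_sum - f_h - f_j) + 0.5j * (f_mix - f_h - f_j)

def periodogram(x):
    """Raw periodogram |DFT|^2 / (2*pi*T); works for complex-valued series."""
    d = np.fft.fft(x)
    return np.abs(d) ** 2 / (2.0 * np.pi * len(x))
```

For periodograms the identity is exact; with smoothed or autoregressive univariate estimates, each of the four spectra may use its own truncation point or order.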

In  the  literature  of  empirical  spectral  analysis,  remarks  are 
frequently  made  about  the  value  of  pre-whitening  or  prefiltering.  It 
has  long  been  my  view  that  prewhitening  is  of  value  but  only  when 
guided  by  model  building.  An  important  approach  to  estimation  of  the 
spectral  density  matrix  which  could  be  said  to  use  prewhitening  is  as 
follows.


First, generate the individual innovations $\eta_j(\cdot)$ of each time
series $X_j(\cdot)$ by fitting to it a suitably long autoregressive scheme:

$$ \eta_j(t) = X_j(t) - \hat c_1^{(j)} X_j(t-1) - \dots - \hat c_{M_j}^{(j)} X_j(t-M_j). \tag{4} $$

Second, by some method of spectral density matrix estimation form
estimators of the spectral density matrix $\{f_{\eta_h \eta_j}(\omega)\}$ of
the multiple time series of individual innovations.


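Third, estimate $f_{hj}(\omega)$ by

$$ \hat f_{hj}(\omega) = \frac{\hat f_{\eta_h \eta_j}(\omega)}{\{1 - \hat c_1^{(h)} e^{i\omega} - \dots - \hat c_{M_h}^{(h)} e^{i\omega M_h}\}\,\{1 - \hat c_1^{(j)} e^{-i\omega} - \dots - \hat c_{M_j}^{(j)} e^{-i\omega M_j}\}}. \tag{5} $$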
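The first and third steps of this prewhitening approach can be sketched as below. The sign convention for the exponents (positive for series h, conjugated for series j) is an assumption that depends on how the cross-spectral density is defined, and the function names are hypothetical. The second step, estimating the innovation cross-spectrum, is left to whatever estimator one prefers.

```python
import numpy as np

def innovations(x, c):
    """eta(t) = x(t) - c_1 x(t-1) - ... - c_M x(t-M), for t >= M."""
    x = np.asarray(x, dtype=float)
    M = len(c)
    eta = x[M:].copy()
    for k in range(1, M + 1):
        eta -= c[k - 1] * x[M - k: len(x) - k]
    return eta

def ar_transfer(c, omega):
    """1 - c_1 e^{i omega} - ... - c_M e^{i omega M} at the frequencies omega."""
    k = np.arange(1, len(c) + 1)
    phases = np.exp(1j * np.outer(np.atleast_1d(omega), k))
    return 1.0 - phases @ np.asarray(c, dtype=float)

def recolor(f_eta_hj, c_h, c_j, omega):
    """Divide the innovation cross-spectrum by both autoregressive transfer functions."""
    return f_eta_hj / (ar_transfer(c_h, omega) * np.conj(ar_transfer(c_j, omega)))
```

Because each series is whitened with its own order, the orders `len(c_h)` and `len(c_j)` need not agree.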


A method of spectral density matrix estimation whose value remains
to be investigated is via the joint autoregressive estimator which is
formed by first fitting a vector autoregressive scheme to the multiple
time series $X(\cdot)$. To conclude this paper we describe the main features
of this approach.

In  the  notation  introduced  in  Section  2,  the  joint  autoregressive 
cross-spectral  density  matrix  estimator  is  defined  by 
$$ \hat f(\omega) = \frac{1}{2\pi}\,\{ A(-\omega)\, \Sigma^{-1} A(\omega)' \}^{-1} \tag{6} $$

where

$$ A(\omega) = I - A(1)\,e^{-i\omega} - \dots - A(M)\,e^{-i\omega M}. \tag{7} $$
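A joint autoregressive spectral matrix can be evaluated as in the sketch below. It assumes the column-vector scheme $X(t) = A(1)X(t-1) + \dots + A(M)X(t-M) + \eta(t)$ with innovation covariance matrix `Sigma`, for which the spectral density matrix is $(1/2\pi)\,A(\omega)^{-1}\,\Sigma\,\{A(\omega)^{-1}\}^{*}$, the star denoting conjugate transpose. Where the transposes and the sign of $\omega$ fall depends on the conventions of Section 2, so this layout is an assumption, as is the function name.

```python
import numpy as np

def var_spectral_matrix(A, Sigma, omega):
    """Spectral density matrix of a vector autoregression at one frequency.

    A: array of shape (M, r, r) holding A(1), ..., A(M);
    Sigma: (r, r) innovation covariance matrix.
    """
    A = np.asarray(A, dtype=float)
    Sigma = np.asarray(Sigma, dtype=float)
    M, r, _ = A.shape
    k = np.arange(1, M + 1)
    # A(omega) = I - A(1) e^{-i omega} - ... - A(M) e^{-i omega M}
    A_w = np.eye(r) - np.tensordot(np.exp(-1j * omega * k), A, axes=1)
    A_inv = np.linalg.inv(A_w)
    return A_inv @ Sigma @ A_inv.conj().T / (2.0 * np.pi)
```

The result is Hermitian at every frequency, and for r = 1 it reduces to the univariate autoregressive spectral density.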

The order $M$ used would be determined by a goodness of fit test
for joint autoregressive schemes fitted to a multiple time series [see
Whittle (1953)]. Here we are interested only in the variability of the
estimator $\hat f(\omega)$ under the assumption that $X(\cdot)$ is a zero mean
stationary multiple time series satisfying a joint autoregressive scheme
of order $M$.

Research is in progress to prove (as rigorously as possible) that
for $0 < \omega < \pi$, with

$$ \nu = \frac{T}{M} \cdot \frac{1}{2}, \tag{8} $$

$\nu \hat f(\omega)$ has a complex Wishart distribution of dimension $r$,
degrees of freedom $\nu$, and covariance matrix $f(\omega)$.

The interpretation of this result is that the variability properties
of the autoregressive cross-spectral estimator $\hat f(\omega)$ are the same as
those of the filtered sample spectral density matrix with rectangular
kernel

$$ k(u) = \begin{cases} 1, & 0 \le |u| \le 1 \\ 0, & |u| > 1 \end{cases} \tag{9} $$

for which $\int_{-\infty}^{\infty} k^2(u)\,du = 2$.

Some advantages of the autoregressive cross-spectral estimator
seem to me to be:

(1) No window is involved in forming $\hat f(\omega)$, so we avoid a
debate as to the choice of window.

(2) The truncation point $M$ can be chosen on the basis that the
multiple time series $X(\cdot)$ passes a goodness of fit test for obeying
a joint autoregressive scheme of order $M$.

(3) Under the assumption that the multiple time series $X(\cdot)$ obeys
a joint autoregressive scheme of order $M$, the autoregressive
cross-spectral estimator has much smaller bias than filtered sample
cross-spectral density estimators.

(4) Autoregressive cross-spectral estimators are easily updated
for additional observations and therefore lend themselves to adaptive
estimation [compare Jones (1966)].